\[H_0:\] Sqft is not an accurate predictor of Price

\[H_a:\] Sqft is influential on Housing Costs

\[\alpha = 0.05\]
# Read in and View Data
df <- read.csv('sf_clean.csv')
head(df)
# It seems evident that square footage plays a large role in rental rates in San Francisco.
slr <- lm(price ~ sqft, data = df)
summary(slr)
##
## Call:
## lm(formula = price ~ sqft, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3382.2 -402.9 -58.3 341.7 11644.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 935.36946 61.81526 15.13 <2e-16 ***
## sqft 2.72293 0.05693 47.83 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 849.3 on 987 degrees of freedom
## Multiple R-squared: 0.6986, Adjusted R-squared: 0.6983
## F-statistic: 2288 on 1 and 987 DF, p-value: < 2.2e-16
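Based on the p-value for sqft in our linear model (p < 2e-16), we can reject the null hypothesis at the 5% significance level. There is sufficient evidence to suggest that sqft is a significant predictor of housing prices in San Francisco: each additional square foot is associated with an estimated increase of about 2.72 in price, and sqft alone accounts for roughly 70% of the variance in price (multiple R-squared = 0.6986).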
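As a worked check on that conclusion, the t statistic and p-value for sqft can be recomputed from the estimate and standard error printed in the summary. This is a minimal sketch: the numbers are copied by hand from the output, and the variable names are illustrative rather than part of the original analysis.

```r
# Values copied from the summary(slr) output
estimate  <- 2.72293   # slope for sqft
std_error <- 0.05693   # standard error of the slope
df_resid  <- 987       # residual degrees of freedom
alpha     <- 0.05      # significance level

# t statistic: how many standard errors the slope lies from zero
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval for the slope
t_crit <- qt(1 - alpha / 2, df = df_resid)
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)

round(t_value, 2)   # 47.83, matching the "t value" column
p_value < alpha     # TRUE, so the null hypothesis is rejected
round(ci, 2)
```

Because the confidence interval excludes zero, it leads to the same decision as the p-value. With a single predictor, the squared t statistic also matches the reported F-statistic (about 2288).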
# Plot histogram of the residuals, showing an approximately normal distribution around 0
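The plotting code itself did not survive in this section, so the following is only a sketch of how such a residual histogram could be drawn. To stay self-contained it fits the model on simulated data: `sim_df`, `sim_slr`, and the simulation parameters are assumptions, not the original `sf_clean.csv` data.

```r
# Simulated stand-in for the real data (assumption, for illustration only)
set.seed(42)
sim_df <- data.frame(sqft = round(runif(200, min = 300, max = 3000)))
sim_df$price <- 900 + 2.7 * sim_df$sqft + rnorm(200, mean = 0, sd = 850)

# Fit the same kind of simple linear regression and extract residuals
sim_slr <- lm(price ~ sqft, data = sim_df)
res <- residuals(sim_slr)

# Histogram of residuals; a roughly symmetric bell shape centred on 0
# is consistent with the normality assumption
h <- hist(res, breaks = 30,
          main = "Histogram of Residuals",
          xlab = "Residual (observed price - fitted price)")

# Normal Q-Q plot as a second visual check on the tails
qqnorm(res)
qqline(res)
```

With the real model, `residuals(slr)` would replace `res`. The Q-Q plot is worth adding there because the residual summary reports a maximum (11644.0) far larger in magnitude than the minimum (-3382.2), which hints at a heavy right tail.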
# Create price prediction model, given random sqft
set.seed(1)
# Draw 30 random square footages between 500 and 5000, rounded to whole numbers
random_sqft <- data.frame(sqft = round(runif(n = 30, min = 500, max = 5000)))
predicted_price <- predict(slr, newdata = random_sqft)
predicted_model_df <- cbind(predicted_price, random_sqft)
summary(predicted_model_df)
## predicted_price sqft
## Min. : 2460 Min. : 560
## 1st Qu.: 5556 1st Qu.:1697
## Median : 7713 Median :2489
## Mean : 8541 Mean :2793
## 3rd Qu.:11569 3rd Qu.:3905
## Max. :14452 Max. :4964
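The predicted prices in this summary are just the fitted line evaluated at each sqft value. The sketch below reproduces the minimum and maximum by hand, using coefficients copied from the `summary(slr)` output. `predict_price` is a hypothetical helper, not a function from the original analysis.

```r
# Coefficients copied from the summary(slr) output
intercept <- 935.36946
slope     <- 2.72293

# Hypothetical helper: fitted line price = intercept + slope * sqft
predict_price <- function(sqft) intercept + slope * sqft

# Smallest and largest sqft values from the summary table
round(predict_price(c(560, 4964)))   # 2460 14452
```

Both values agree with the Min. and Max. rows of `predicted_price`. If the uncertainty of each prediction matters, `predict(slr, newdata = random_sqft, interval = "prediction")` returns lower and upper bounds alongside the fit.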
mlr <- lm(price ~ sqft + hood_district, data = df)
summary(mlr)
##
## Call:
## lm(formula = price ~ sqft + hood_district, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3473.7 -408.6 -60.0 351.6 11579.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 724.9008 101.9471 7.111 2.22e-12 ***
## sqft 2.7286 0.0568 48.035 < 2e-16 ***
## hood_district 29.0593 11.2118 2.592 0.00969 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 846.8 on 986 degrees of freedom
## Multiple R-squared: 0.7007, Adjusted R-squared: 0.7
## F-statistic: 1154 on 2 and 986 DF, p-value: < 2.2e-16
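One way to read the two summaries side by side is through adjusted R-squared, which penalizes the extra predictor. The sketch below recomputes it from the printed multiple R-squared values. `adj_r2` is a hypothetical helper, and `n = 989` is inferred from the residual degrees of freedom plus the number of estimated coefficients.

```r
# Hypothetical helper: adjusted R-squared from R-squared,
# sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 989  # 987 residual df + 2 coefficients (equivalently 986 + 3)

adj_slr <- adj_r2(0.6986, n, p = 1)  # price ~ sqft
adj_mlr <- adj_r2(0.7007, n, p = 2)  # price ~ sqft + hood_district

round(c(slr = adj_slr, mlr = adj_mlr), 4)
adj_mlr - adj_slr   # small positive gain from adding hood_district
```

The gain is on the order of 0.002, so although hood_district is statistically significant (p = 0.00969), it adds little explanatory power beyond sqft in this specification.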